93 research outputs found

    Benchmarking natural-language parsers for biological applications using dependency graphs

    BACKGROUND: Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. RESULTS: Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. CONCLUSION: Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques
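As a rough illustration of the evaluation idea in this abstract, comparing dependency graphs can be sketched as set overlap between gold-standard and predicted edges, optionally restricted to the relation types a given application cares about. This is a minimal sketch and not the authors' implementation: the names `Dep`, `score`, and `restrict`, the relation labels, and the example edges are all hypothetical.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One dependency edge: head word, relation label, dependent word (hypothetical schema)."""
    head: str
    relation: str
    dependent: str


def score(gold: set, predicted: set) -> dict:
    """Precision, recall and F1 over exact matches between two sets of edges."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def restrict(deps: set, relations: set) -> set:
    """Keep only the edges whose relation label matters for the application."""
    return {d for d in deps if d.relation in relations}


# Invented example: the predicted graph attaches the wrong prepositional object.
gold = {
    Dep("expressed", "nsubjpass", "gene"),
    Dep("expressed", "prep_in", "cells"),
    Dep("gene", "det", "the"),
}
predicted = {
    Dep("expressed", "nsubjpass", "gene"),
    Dep("expressed", "prep_in", "tissue"),
    Dep("gene", "det", "the"),
}

overall = score(gold, predicted)

# Application-specific view: ignore determiners, score only the relations of interest.
relevant = {"nsubjpass", "prep_in"}
targeted = score(restrict(gold, relevant), restrict(predicted, relevant))
```

In this toy case the correct determiner edge raises the overall score, while the restricted score isolates the attachment error that would matter to an extraction task.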

    In Vitro Cultivation of 'Unculturable' Oral Bacteria, Facilitated by Community Culture and Media Supplementation with Siderophores

    Over a third of oral bacteria are as-yet-uncultivated in-vitro. Siderophores have been previously shown to enable in-vitro growth of previously uncultivated bacteria. The objective of this study was to cultivate novel oral bacteria in siderophore-supplemented culture media. Various compounds with siderophore activity, including pyoverdines-Fe-complex, desferricoprogen and salicylic acid, were found to stimulate the growth of difficult-to-culture strains Prevotella sp. HOT-376 and Fretibacterium fastidiosum. Furthermore, pyrosequencing analysis demonstrated increased proportions of the as-yet-uncultivated phylotypes Dialister sp. HOT-119 and Megasphaera sp. HOT-123 on mixed culture plates supplemented with siderophores. Therefore a culture model was developed, which incorporated 15 µg siderophore (pyoverdines-Fe-complex or desferricoprogen) or 150 µl neat subgingival-plaque suspension into a central well on agar plates that were inoculated with heavily-diluted subgingival-plaque samples from subjects with periodontitis. Colonies showing satellitism were passaged onto fresh plates in co-culture with selected helper strains. Five novel strains, representatives of three previously-uncultivated taxa (Anaerolineae bacterium HOT-439, the first oral taxon from the Chloroflexi phylum to have been cultivated; Bacteroidetes bacterium HOT-365; and Peptostreptococcaceae bacterium HOT-091) were successfully isolated. All novel isolates required helper strains for growth, implying dependence on a biofilm lifestyle. Their characterisation will further our understanding of the human oral microbiome

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field. A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    The oral microbiome – an update for oral healthcare professionals

    For millions of years, our resident microbes have coevolved and coexisted with us in a mostly harmonious symbiotic relationship. We are not distinct entities from our microbiome, but together we form a 'superorganism' or holobiont, with the microbiome playing a significant role in our physiology and health. The mouth houses the second most diverse microbial community in the body, harbouring over 700 species of bacteria that colonise the hard surfaces of teeth and the soft tissues of the oral mucosa. Through recent advances in technology, we have started to unravel the complexities of the oral microbiome and gained new insights into its role during both health and disease. Perturbations of the oral microbiome through modern-day lifestyles can have detrimental consequences for our general and oral health. In dysbiosis, the finely-tuned equilibrium of the oral ecosystem is disrupted, allowing disease-promoting bacteria to manifest and cause conditions such as caries, gingivitis and periodontitis. For practitioners and patients alike, promoting a balanced microbiome is therefore important to effectively maintain or restore oral health. This article aims to give an update on our current knowledge of the oral microbiome in health and disease and to discuss implications for modern-day oral healthcare

    Cross-lingual C*ST*RD: English access to Hindi information

    We present C*ST*RD, a cross-language information delivery system that supports cross-language information retrieval, information space visualization and navigation, machine translation, and text summarization of single documents and clusters of documents. C*ST*RD was assembled and trained within 1 month, in the context of DARPA’s Surprise Language Exercise, that selected as source a heretofore unstudied language, Hindi. Given the brief time, we could not create deep Hindi capabilities for all the modules, but instead experimented with combining shallow Hindi capabilities, or even English-only modules, into one integrated system. Various possible configurations, with different tradeoffs in processing speed and ease of use, enable the rapid deployment of C*ST*RD to new languages under various conditions

    Adaptation of Data and Models for Probabilistic Parsing of Portuguese
